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Abstract. In this contribution, we present new algorithms to source separation for the case of noisy 
instantaneous linear mixture, within the Bayesian statistical framework. The source distribution 
prior is modeled by a mixture of Gaussians [1] and the mixing matrix elements distributions by 
a Gaussian [2]. We model the mixture of Gaussians hierarchically by mean of hidden variables 
representing the labels of the mixture. Then, we consider the joint a posteriori distribution of 
sources, mixing matrix elements, labels of the mixture and other parameters of the mixture with 
appropriate prior probability laws to eliminate degeneracy of the likelihood function of variance 
parameters and we propose two iterative algorithms to estimate jointly sources, mixing matrix and 
hyperparameters: Joint MAP (Maximum a posteriori) algorithm and penalized EM algorithm. The 
illustrative example is taken in [3] to compare with other algorithms proposed in literature. 

PROBLEM DESCRIPTION 

We consider a linear instantaneous mixture of n sources. Observations could be cor- 
rupted by an additive noise. This noise may represent measurement errors or model 
incertainty: 

x{t) = As{t) + t{t), t = l,..,T (1) 

where x(t) is the (m x 1) measurement vector, s(t) is the (n x 1) source vector which 
components have to be separated, A is the mixing matrix of dimension (m x n) and e(t) 
represents noise affecting the measurements. We assume that the (m x T) noise matrix 
e{t) is statistically independant of sources, centered, white and Gaussian with known 
variance cr^ /. We note si,,t the matrix n x T of sources and xi,,t the matrix m x T of 
data. 

Source separation problem consists of two sub-problems: Sources restoration and 
mixing matrix identification. Therefore, three directions can be followed: 

1. Supervised learning: Identify A knowing a training sequence of sources s, then use 
it to reconstruct the sources. 

2. Unsupervised learning: Identify A directly from a part or the whole observations 
and then use it to recover s. 



3. Unsupervised joint estimation: Estimate jointly s and A 

In the following, we investigate the third direction. This choice is motivated by practical 

cases where sources and mixing matrix are unknown. 

This paper is organised as follows: We begin in section II by proposing a Bayesian 
approach to source separation. We set up the notations, present the prior laws of the 
sources and the mixing matrix elements and present the joint MAP estimation algorithm 
assuming known hyperparameters. We introduce, in section III, a hierarchical modelisa- 
tion of the sources by mean of hidden variables representing the labels of the mixture of 
Gaussians in the prior modeling and present a version of JMAP using the estimation of 
these hidden variables (classification) as an intermediate step. In both algorithms, we as- 
sumed known the hyperparameters which is not realistic in applications. That is why, in 
section IV, we present an original method for the estimation of hyperparameters which 
takes advantages of using this hierarchical modeling. Finally, since EM algorithm has 
been used extensively in source separation [4], we considered this algorithm and pro- 
pose, in section V, a penalized version of the EM algorithm for source separation. This 
penalization of the likelihood function is necessary to eliminate its degeneracy when 
some variances of Gaussian mixture approche zero [5]. Each section is supported by 
one typical simulation result and partial conclusion. At the end, we compare the two last 
algorithms. 



BAYESIAN APPROACH TO SOURCE SEPARATION 

Given the observations cci. t, the joint a posteriori distribution of unknown variables 
Si..T and A is: 

p{A,Si„t\xi„t) (xp{xi,.t\A,Si.,t) p{A)p{si.,t) (2) 

where p{A) and are the prior distributions through which we modelise our a 

priori information about sources s and mixing matrix A. p{xi„t\A,Si.,t) is the joint 
likelihood distribution. We have, now, three directions: 

1 . First, integrate (2) with respect to Si..r to obtain the marginal in A and then estimate 
A by: 

A — a,Tgmeix{J{A)=lnp{A\xi„T)} (3) 

A 

2. Second, integrate (2) with respect to A to obtain the marginal in si„t and then 
estimate Si. t by: 

Si..r = argmax{J(si..T) = lnp(si..T|a;i..T)} (4) 

31. .T 

3. Third, estimate jointly Si..t and A: 

(A,Si..t) = argmax{J(A,Si..T) =lnp{A,Si„T\xi..T)} (5) 

{A,si..t) 



Choice of a priori distributions 



The a priori distribution reflects our knowledge concerning the parameter to be 
estimated. Therefore, it must be neither very specific to a particular problem nor too 
general (uniform) and non informative. A parametric model for these distributions seems 
to fit this goal: Its stucture expresses the particularity of the problem and its parameters 
allow a certain flexibility. 

Sources a priori: For sources s, we choose a mixture of Gaussians [1]: 

P{sj) = ^ar^^{mji,(J%), j = ^-n (6) 
1=1 

Hyperparameters Qj are supposed to be known. 
This choice was motivated by the following points: 

• It represents a general class of distributions and is convenient in many digital 
communications and image processing applications. 

• For a Gaussian likelihood p(a;i T|si..T,^) (considered as a function of Si. t), the 
a posteriori law remains in the same class (conjugate prior). We then have only to 
update the parameters of the mixture with the data. 

Mixing matrix a priori: To account for some model uncertainty, we assign a Gaussian 
prior law to each element of the mixing matrix A: 

p{A,)=X{M,,,al^) (7) 

which can be interpreted as knowing every element (Mji) with some uncertainty ((7a,ij)- 
We underline here the advantage of estimating the mixing matrix A and not a separating 
matrix B (inverse of A) which is the case of almost all the existing methods for source 
separation (see for example [6]). This approach has at least two advantages: (i) A does 
not need to be invertible (n 7^ m), (ii) naturally, we have some a priori information on 
the mixing matrix not on its inverse which may not exist. 



JMAP algorithm 

We propose an alternating iterative algorithm to estimate jointly Si. t and A by 
extremizing the log-posterior distribution: 

^L.T = a.rgma.x^^^\np(^A^''~^\si„T\xi..T^ 
j^{k) ^ argmax^lnp ^AjS^'^ylccL.T^ 

In the following, we suppose that sources are white and spatially independant. This 

assumption is not necessary in our approach but we start from here to be able to compare 
later with other classical methods in which this hypothesis is fundamental. 



With this hypothesis, in step {k + 1), the criterion to optimize with respect to Si. r is: 



JiSl..T) = E 



t=l 



\np(x{t)\A^'\s{t))+J2^npj(sj(t)) (9) 
Therefore, the optimisation is done independantly at each time t: 

n 

s{t)^''+^^ ^eiTgmeix{\n p(x{t)\A^''\s) In pj{sj{t))} (10) 



The a posteriori distribution of s is a mixture of 11^=1 Qj Gaussians. This leads to a 
high computational cost. To obtain a more reasonable algorithm, we propose an iterative 
scalar algorithm. The first step consists in estimating each source component knowing 
the other components estimated in the previous iteration: 

s,-(i)M = argmax{lnpfs,(i)|a;(i),l(^),s,^,-(i)(^))} (11) 



The a posteriori distribution of Sj is a mixture of qj Gaussians: Yll=i '^iz'^i^jzi^'iz) 



with: 
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Xi = ^Aii Si 



(13) 



If the means m'j^ aren't close to each other, we are in the case of a multi-modal 
distribution. The algorithm to estimate Sj is to first compute Xi, i — 1,. . . ,m, nij and 



(7| by (13) and then olj^, a'^^ and m^^J, by (12), and select the m^^ for which the ratio ^ 



J2! 



is the greatest one. 

After a full update of all sources Si..t, the estimate of A is obtained by optimizing: 

A^) = ELi In P {x{t)\A, s^+\t)) + In p ( A(t)) + cte (14) 
which is quadratic in elements of A. The gradient has then a simple expression: 



T 

^ = E ^2^^" w (^^(^) - [^^^'(^)] (A. - M,,) (15) 

2 

Cancelling the gradient to zero and defining Aj^^ = we obtain the following 



_ 

relation: 



.t=i 



■A,,,(A,-M,,,) = (16) 



We define the operator Vect transforming a matrix to a vector by the concatenation of 
the transposed rows. Operator Mat is the inverse of Vect. Applying operator Vect to 
relation (16), we obtain the following expression: 

Vect (a;i..T(s?+T)^) + I^Vect(M) = (/x + S*) Vect A (17) 

where is a diagonal matrix (nm x nm) which diagonal vector is Vect{{Aij)i=i„rn,j=i..n) 
and S* the matrix (nm x nm) with block diagonals Sl.tSl.t estimated at iteration 
(k + l). We have finally the explicit estimation of A: 



Mat{[ii + S*]-^ [liVect{M) + Vect {xi^.Tist^Tf)]) (18) 



To show the fais ability of this algorithm, we consider in the following a telecom- 
munication example. For this, we simulated synthetic data with sources described by a 
mixture of 4 Gaussians centered at —3, —1, 1 and 3, with the same variance 0.01 and 

weighted by 0.3, 0.1, 0.4 and 0.2. The unknown mixing matrix is A = ^ ^'^ 

w « ^.u • • . f A. nyr f ^\ A X 150 0.009 \ 

We fixed the a prion parameters of A to: iVi = I ^ ^ j and A = I ^ j , 

meaning that we are nearly sure of diagonal values but we are very uncertain about the 
other elements of A. Noise of variance o"^ = 1 was added to the data. The figure 1 il- 
lustrates the ability of the algorithm to perform the separation. However, we note that 
estimated sources are very centered arround the means. This is because we fixed very 
low values for the a priori variances of Gaussian mixture. Thus, the algorithm is sensi- 
tive to the a priori parameters and exploitation of data is useful. We will see in section 
IV how to deal with this issue. 




Figure 1- Results of separation with QAM-16 (Quadratic Amplitude Modulation) 
using JMAP algorithm: (a) phase space distribution of sources, 
(b) mixed signals, and (c) separated sources 

Now, we are going to re-examine closely the expression for the a posteriori distri- 
bution of sources. It's a multi-modal distribution if the Gaussian means aren't too close. 
The maximum of this distribution doesn't correspond, in general, to the maximum of 
the most probable Gaussian. So, we intend to estimate first, at each time t, the a priori 
Gaussian law according to which the source s{t) is generated (classification) and then 
estimate s{t) as the mean of the a posteriori Gaussian. This leads us to the introduction 
of hidden variables and hierarchical modelization. 



HroDEN VARIABLES 

The a priori distribution of the component Sj is p{sj) = Yll=i^ji^{^ji^'^%)- 
consider now the hidden variable Zj taking its values in the discrete set = (1, . . . , gj) 
so each source can belong to one of the qj sources, with aji —p{zj — i). Given zj — i, 
Sj is normal M{mji,aj-). We can extend this notion to vectorial case by considering the 
vector z = [zi,. . . ,Zn] taking its values in the set Z = W-^^Zj. The s distribution given 
z is a normal lawp(s|z) = /^{raz^T^) with: 



r. = diag«,<,...,a^,J (20) 
The marginal a priori law of s is the mixture of n^^i^j Gaussians: 

p(s) = J]p(z)p(s|z) (21) 

We can re-interpret this mixture by considering it as a discrete set of couples {Mz,p{z)) 
(see Figure 2). Sources which belong to this class of distributions are generated as 
follows: First, generate the hidden variable z e Z according p{z) and then, given this z. 



generate s according to A^. This model can be extended to include continuous values of 
z (also continuous distribution f{z)) and then to take account of infinity of distributions 
in only one class (see Figure 2). 
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Figure 2- Hierarchical modelization with hidden variables 



a posteriori distribution of sources 

In the following, we suppose that mixing matrix is known. The joint law of s, z 

and X can be factorized in two forms: p{s,z,x) = p{x\s)p{s\z)p{z) or p{s,z,x) — 
p{s\x, z)p{z\x)p{x). Thus, the marginal a posteriori law has two forms: 

p(.|a.) = V^i^M^4^ (22) 

or 

p(s|a;) = ^p(2;|a;)p(s|a;,z) (23) 

We note in the second form that the a posteriori is in the same class that of the a priori 
(same expressions but conditionally to x). This is due to the fact that mixture of Gaus- 
sians is a conjugate prior for Gaussian likelihood. Our strategy of estimation is based on 
this remark: The sources are modeled hierarchically, we estimate them hierarchically; 
we begin by estimating the hidden variable using p{z\x) and then estimate sources us- 
ing p{s\x, z) which is Gaussian of mean Oxz'- 

ex^ = m^ + T^A''Yi^{x-Am^) (24) 

and variance V^z'- 

v,, = r,-r,A*R,Ar, (25) 

where. 



il, = (Ar,A* + i2„)-i (26) 



and Rn represent the noise covariance. 

Now we have to estimate z by using p{z\x) which is obtained by integrating the joint 
a posteriori of z and s with respect to s: 

p{z\x) — J p{z,s\x)ds (X p{z) j p{x\s)p{s\z)ds (21) 

The expression to integrate is Gaussian in s. The result is immediate: 

p{z\x)(xp{z) I |~^| 14^ |5 exp[K^^] (28) 

where: 



Kzx = —^{Arriz — xYQxziArriz — x) 

Qxz = (/-i^.A^,A*)i^„l(/-A^,A*R,) + R,A^,A*R, 

If now we consider the whole observations, the law of Zi. t is: 

PiZi„T\Xi..T) OC p{Zi„t) J PiXi„T\Sl..T)piSi„T\Zl..T)dSi„T 

Supposing that z{t) are a priori independant, the last relation becomes: 



Z1..T \ z{t) 



t=l..T 



(29) 



(30) 



p{zi..T\x,..T)^Uj^,i^p{z{t)) J p{x{t)\s{t))p{s{t)\z{t))ds{t)^ (31) 



Estimation of Zi„t is then performed observation by observation: 



argmaxp(2;i..T|a;i..T) = argmaxp(2;(t) |cc(i)) (32) 



Hierarchical JMAP algorithm 

Taking into account of this hierarchical model, the JMAP algorithm is implemented 
in three steps. At iteration (k): 

1. First, estimate the hidden variable zmap (combinatary estimation) given observa- 
tions and mixing matrix estimated in the previous iteration: 



^MAP 



^''\p{t)^eiTgmax{p(z{t)\x{t),A^''-^A} (33) 
z{t) V / 



^(k) 

2. Second, given the estimated ^Vap' source vector s follows Gaussian law 
J\f{6 ^(k) , V ^(fc) ) and then the source estimate is ^(k) . 

3. Third, given the estimated sources P', mixing matrix is evaluated as in the algo- 
rithm of section n. 



We evaluated this algorithm using the same synthetic data as in section 2. Separation 
was robust as shown in Figure 3: 




(a) 



(b) 



(c) 



Figure 3- Results of separation with QAM-16 
using Hierarchical JMAP algorithm: (a) phase space distribution of sources, 
(b) mixed signals, and (c) separated sources 

The Bayesian approach allows us to express our a priori information via paramet- 
ric prior models. However, in general, we may not know the parameters of the a priori 
distributions. This is the task of the next section where we estimate the unknown 
hyperparameters always in a Bayesian framework. 



HYPERPARAMETERS ESTIMATION 

The hyperparameters considered here are the means and the variances of Gaussian 
mixture prior of sources: sj ~ J2l=i^jz-^ (j^jz,:;!!—^, j = 1, We develop, in 

the following, a novel method to extract the hyperparameters from the observations 
Xi._T- The main idea is: conditioned on the hidden variables (^j)i..T = [^i(l); • • -^^jiT)], 
hyperparameters rrijz and ^pj^ for z G Zj — {l,...,qj) axe means and variances of 
a Gaussian distribution. Thus, given the vector (%)i..t = i^ji^): - ■ ■ i^jiT)], we can 
perform a partition of the set T = [1, . . . , T] into sub-sets % as: 

%^{t\zj(t)^z}, zeZj (34) 

This is the classification step. 

Suppose now that mixing matrix A and components Si^j are fixed and we are 
interested in the estimation of rrijz and ipj^. Let 9jz — {rrijz , V'jz)- 

The joint a posteriori law of Sj and Oj^^ given Zj at time t is: 

p{sj, Ojz I X, Zj) oc p{x I s)p{sj I Ojz, Zj)p{ejz I Zj) (35) 

p{sj\ 9jz, Zj) is Gaussian of mean rrijz and inverted variance ipjz. 

p{Ojz I Zj) — p{Ojz) — p{iTijz)p{ipjz) is hyperparameters a priori. The marginal a poste- 



riori distribution of 9jz is obtained from previous relation by integration over Sj: 

p{ejz I X, Zj) oc p{ejz) / p{x I s)p{sj I Ojz, Zj) dsj. (36) 

Jsj 

The expression inside the integral is proportional to the joint a posteriori distribution of 
{sj , Zj) given x and 9jz, thus: 

p{9jz I X, Zj) oc p{djz)p{zj I cc, 6'j2). (37) 

where p{zj \ x, djz) is proportional to a'j^ as defined in expression (12). Noting (f)j — 
1 / a J and t/jj^ = 1 / cr|_^, we have: 



I oc ^ Ji^l- exp [-i ^ - ™,)^ (38) 
Note that the likelihood is normal for means rrijz and Gamma for Xjz — 

{<l>j'^jz)/{<l>j+'tpjz)- 

Choosing a uniform a priori for the means, the estimate of rrijz is: 



^MAP ^ Eterjrijit) ^^^^ 



For variances, we can choose (i) an inverted Gamma prior Q (a, (3) after developing the 
expression for Xj^ knowing the relative order of ^lJjz and (pj (to make Xjz linear in ipj^) or 
(ii) an a prior which is Gamma in Xjz- These choices are motivated by two points: First, it 
is a proper prior which eliminate degenaracy of some variances at zero (It is shown in [5] 
that hyperparameter likelihood (noiseless case without mixing) is unbounded causing a 
variance degeneracy at zero). Second, it is a conjugate prior so estimation expressions 
remain simple to implement. The estimate of inverted variance (first choice when ^jz is 
the same order of 0^) is: 



r (40) 



a 



posteriori 



jz Impost' 



with ftP^^te""" = q; + f and /Jp^^^""" = /3 + ^terM^it) 



Hierarchical JMAP including estimation of hyperparameters 

Including the estimation of hyperparameters, the proposed hierarchical JMAP algo- 
rithm is composed of five steps: 

1. Estimate hidden variables (zj)^^^ by: 

(^j)i^r^ = (argmaxp(2;j | x{t), rrijz , V'jz, A, Si^j))i„T (41) 



which permits to estimate partitions: 



% = {t\{z,f^''{t) = z] {AD 

This corresponds to the classification step in the previous algorithm 

2. Given the estimate of partitions, hyperparameters ^jf^^^ and m^f^^ are updated 
according to equations (39) and (25). The following steps are the same as those in 
the previous proposed algorithm 

3. Re-estimation of hidden variables (%)^^t^ given the estimated hyperparameters. 

4. Estimation of sources (s)?^^^. 

5. Estimation of mixing matrix (A)^^^. 



Simulation results 

To be able to compare the results obtained by this algorithm and the Penalized 
likelihood algorithm developed in the next section with the results obtained by some 
other classical methods, we generated data according to the example described in [3]. 
Data generation: 2-D sources, every component a priori is mixture of two Gaussians 
(±1), ilj = 100 for all Gaussians. Original sources are mixed with mixing matrix A — 

A noise of variance = 0.03 is added (SNR = 15 dB). Number of 



0.4 1 

observations is 1000. 

™^ / 1 \ ^ / 150 0.009 \ ^ f 0.5 0.5 \ 
Parameters: M = ^ ^ l)^^=[ 0.009 150 )^^=[o.5 0.5 J' « = ^00 

and P = 2. 

Initial conditions: A(o) = J ° ^, ^(o) = | | ^, m^") = [J o ) ^^d s^o) 

generated according to sf^ ~ Ylz=i^jzJ^{mfJ , -4^). 

Sources are recovered with negligible mean quadratic error: MEQ(si) — 0.0094 and 
MEQ{s2) = 0.0097. The following figures illustrate separation results: 

The non-negative performance index of [7] is used to chacarterize mixing matrix 
identification achievement: 

ind{S^A-^A) = i 

Figure 7a represents the index evolution through iterations. Note the convergence of 
JMAP algorithm since iteration 30 to a satisfactory value of —A5dB. For the same 
SNR, algorithms PWS, NS [3] and EASI [6] reach a value greater than -35 dB after 
6000 observations. Figures 7b and 7c illustrate the identification of hyperparameters. We 
note the algorithm convergence to the original values (—1 for mn and 100 for ^n). 
In order to validate the idea of data classification before estimating hyperparameters. 
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we can visualize the evolution of classification error (number of data badly classified). 
Figure 7d shows that this error converges to zero at iteration 15. Then, after this iteration, 
hyperparameters identification is performed on the true classified data. Estimation of 
rrijz and i/jjz takes into account only data which belong to this class and then it is not 
corrupted by other data which bring erroneous information on these hyperparameters. 
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Figure 6- Separation results with S'A^i? = 15 dB: Histograms of sources, 
mixed signals and separated sources. 



Figure 7-a- Evolution of index through iterations 



Figure 7-b- Identification of rriu 



Figure 7-c- Identification of t(; 
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Figure 7-d- Evolution of classification error 



Thus, a joint estimation of sources, mixing matrix and hyperparameters is performed 
successfully with a JMAP algorithm. The EM algorithm was used in [4] to solve source 



separation problem in a maximum likelihood context. We now use the EM algorithm 
in a Bayesian approach to take into account of our a priori information on the mixing 
matrix. 



PENALIZED EM 

The EM algorithm has been used extensively in data analysis to find the maximum 
likelihood estimation of a set of parameters from given data [8]. Considering both the 
mixing matrix A and hyperparameters 6, at the same level, being unknown parameters 
and complete data Xi„t and Si..t- Complete data means jointly observed data Xi t 
and unobserved data Si..t. The EM algorithm is executed in two steps: (i) E-step 
(expectation) consists in forming the logarithm of the joint distribution of observed 
data X and hidden data s conditionally to parameters A and and then compute 
its expectation conditionally to x and estimated parameters A and 6 (evaluated in 
the previous iteration), (ii) M-step (maximization) consists of the maximization of the 
obtained functional with respect to the parameters A and 0: 

1. E-step : 

g ( A, 6> I A', d') = E^^, [log p{x,s\ A, 0) \ x, A', d'] (43) 

2. M-step : 

(a, d) = argmaxjg (A, 6 \ A', 6')} (44) 

Recently, in [4], an EM algorithm has been used in source separation with mixture of 
Gaussians as sources prior. In this work, we show that: 

1. This algorithm fails in estimating variances of Gaussian mixture. We proved that 
this is because the degeneracy of the estimated variance to zero. 

2. The computational cost of this algorithm is very high. 

3. The algorithm is very sensitive to initial conditions. 

4. In [4], there's neither an a priori distribution on the mixing matrix A or on the 
hyperparameters 0. 

Here, we propose to extend this algorithm in two ways by: 

1. Introducing an a priori distribution for 6 to eliminate degeneracy and an a priori 
distribution for A to express our previous knowledge on the mixing matrix. 

2. Taking advantage of our hierarchical model and the idea of classification to reduce 
the computational cost. 

To distinguish the proposed algorithm from the one proposed in [4], we call this algo- 
rithm the Penalized EM. The two steps become: 

1. E-step : 

QiA,d\A', e') = E^,, [log p{x,s\A,e)+ log p( A) + log p(0) \x,A!,e'] (45) 



2. M-step : 



(a, e) = argmaxg (A, d \ A', 6') (46) 

The joint distribution is factorized as: p{x, s, A, 6) = p{x \ A, s)p{A)p{s \ 6)p{6). 
We can remark that p{x, s, A, 0) as a function of (A, 6) is separable in A and 9. Con- 
sequently, the functional is separated into two factors: one representing an A functional 
and the other representing a 6 functional: 



Q{A,e\A',e')^Qa{A\A', e') + (6> | a', e') m 



with: 



g„(A|A',6»') = E[\ogp{x\A,s)+\ogp{A)\x,A\e'] 

Qh{0\A',d') = E[iogp{s\e)+iogpie)\x,A',e'] 



(48) 



- Maximisation with respect to A 

The functional Qa is: 

^« = :r^E^ \{x{t)-As{t)f{x{t)-As{t))\x, A', 0'] +logp(A). (49) 
i=i ^ 

The gradient of this expression with respect to the elements of A is: 

dQa 



T /- - \ 1 

2 ( -^xs A Rss I 2 i-^hJ ■ 



(50) 



where: 



I Rss = ^EliE[s{t)s{tr\x,A',d'] 

Evaluation of R.j.s and Rgg requires the computation of the expectations of x{t) 
and s{t) The main computational cost is due to the fact that the expectation of any 
function / (s) is given by: 

E[f{s)\x, A', e']= Yl E[f{s)\x,z^ z', A', 6'] p{z' \ x, A', 6'). (52) 

which involves a sum of YYj=i^ij) terms corresponding to the whole combinations of 
labels. One way to obtain an approximate but fast estimate of this expression is to limit 
the summation to only one term corresponding to the MAP estimate of z: 

E[f{s) I X, A', e'] = E[f{s)\x,z = z^'^'', A', 6'] . (53) 



Then, given estimated labels Zi. r, the source s{t) a posteriori law is Normal with mean 
dj-z and variance V^z given by (24) and (40). 



The source estimate is then d^z- Rxs and Rss become: 



1 ^ 

R,s^-Y.'^{t)s{tf (54) 
t=i 

and 

T T 

Rss = ^^^(t)^(t)^ + i^(A*il„U + r,^)-i (55) 
t=i t=i 

When estimated and using the matrix operations defined in section II and 

cancelling the gradient (50) to zero, we obtain the expression of the estimate of A: 



1^=+^ = Mat 



A + TR* 



-1 



AVect{M)+TVect(R^ 



(56) 



- Maximisation with respect to 6 

With a uniform a priori for the means, maximisation of Qh with respect to rrijz gives 



j :liOjz{t)p{z{t)\x,A', e ') 



With an Inverted Gamma prior Q{a, (3) (a > et /? > 1) for the variances, the 
maximisation of Qh with respect to Ojz gives: 



(Tjz — ; W°) 

EliPi4t)\x,A',e') + 2{a-l) 



Summary of the Penalized EM algorithm 

Based on the preceeding equations, we propose the following algorithm to estimate 
sources and parameters using the following five steps: 

1. Estimate the hyperparameters according to (57) and (58). 

2. Update of data classification by estimating z^^^ . 

3. Given this classification, sources estimate is the mean of the Gaussian a posteriori 
law (39). 

4. Update of data classification. 

5. Estimate the mixing matrix A according to the re-estimation equation (56). 



COMPARISON WITH JMAP ALGORITHM AND ITS 
SENSITIVITY TO INITIAL CONDITIONS 



The Penalized EM algorithm has an optimization cost approximately 2 times higher, 
per sample, than the JMAP algorithm. However, both algorithms have a reasonable 
computational complexity, linearly increasing with the number of samples. Sensitivity 
to initial conditions is inherent to the EM-algorithm even to the penalized version. In 
order to illustrate this fact, we simulated the algorithm with the same parameters as 

in section IV. Note that initial conditions for hyperparameters are ip^^^ = ^ ^ 



1 1 



and 



m 



(0) 



Q Q J . However, the Penalized EM algorithm fails in separating sources (see 
figure 11). We note then that JMAP algorithm is more robust to initial conditions. 
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Figure 1 1- Results of separation with the Penalized EM algorithm: 
(a) Phase space distribution of sources, 
(b) mixed signals and (c) separated sources 

We modified the initial condition to have means : m '^^^ = ( _ ^ ' ^ ^ ' ^ j . We noted, in 

this case, the convergence of the Penalized EM algorithm to the correct solution. Figures 
12-16 illustrate the separation results: 
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Figure 12- Results of separation with the Penalized EM algorithm: 
(a) Phase space distribution of sources, 
(b) mixed signals and (c) separated sources 
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Figure 13- Evolution of classification error Figure 14- Evolution of index 




Iteration Iteration 



Figure 15- Identification of mu Figure 16- Identification of 

CONCLUSION 

We have proposed solutions to source separation problem using a Bayesian framework. 
Specific aspects of the described approach include: 

• Taking account of errors on model and measurements. 

• Introduction of a priori distribution for the mixing matrix and hyperparameters. 
This was motivated by two different reasons: Mixing matrix prior should exploit 
previous information and variances prior should regularize the log-posterior objec- 
tive function. 

We then consider the problem in terms of a mixture of Gaussian priors to develop a 
hierarchical strategy for source estimation. This same interpretation leads us to classify 
data before estimating hyperparameters and to reduce computational cost in the case of 
the proposed Penalized EM algorithm. 
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